This tidy data set contains 1,599 red wines with 11 variables on the chemical properties of the wine. At least 3 wine experts rated the quality of each wine, providing a rating between 0 (very bad) and 10 (very excellent).
The dataset is related to the red variant of the Portuguese “Vinho Verde” wine.
For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009].
Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
Attributes (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Attribute (based on sensory data):
12 - quality (score between 0 and 10)
Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at too high levels can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
12 - quality (score between 0 and 10)
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
We add to the data set the ratio of volatile acidity to fixed acidity (relative.volatile.acidity) to analyze it later.
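The new column appears to be the plain ratio of the two acidities: for rows 837 and 838 printed further down, 0.28 / 6.7 = 0.04179104, which matches their `relative.volatile.acidity`. A minimal sketch of that step, assuming a small illustrative data frame (`wqr_sample` is a hypothetical name; its values are copied from the `str()` output and from row 837):

```r
# Illustrative sample: four rows from the str() output plus row 837.
wqr_sample <- data.frame(
  fixed.acidity    = c(7.4, 7.8, 7.8, 11.2, 6.7),
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.28)
)

# Ratio of volatile to fixed acidity (unitless, both inputs are in g/dm^3).
wqr_sample$relative.volatile.acidity <-
  wqr_sample$volatile.acidity / wqr_sample$fixed.acidity

print(wqr_sample)
```

The same pattern (one column divided by another) would apply to any other ratio variable.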
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01348 0.04405 0.06569 0.06706 0.08581 0.20800
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
##
## 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14
## 132 33 50 30 29 20 24 22 33 30 35 15 27 18 21
## 0.15 0.16 0.17 0.18 0.19 0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
## 19 9 16 22 21 25 33 27 25 51 27 38 20 19 21
## 0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44
## 30 30 32 25 24 13 20 19 14 28 29 16 29 15 23
## 0.45 0.46 0.47 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
## 22 19 18 23 68 20 13 17 14 13 12 8 9 9 8
## 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 0.71 0.72 0.73 0.74
## 9 2 1 10 9 7 14 2 11 4 2 1 1 3 4
## 0.75 0.76 0.78 0.79 1
## 1 3 1 1 1
The number of wines tends to decrease as citric acid rises toward 0.75, but three anomalies can be observed: many wines with a value of 0 or practically 0, many others with 0.49, and an extreme value of 1 that could be an erroneous value or a wine that aims for a very fruity flavor.
We add to the data set the ratio of citric acid to fixed acidity (relative.citric.acid) to analyze it later.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.01292 0.03291 0.03084 0.04503 0.13929
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
Using a logarithmic axis, the distribution of residual sugar looks approximately normal (bell-shaped).
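As a rough illustration of why the logarithmic axis helps, the sketch below compares the skewness of residual sugar on the raw and log10 scales. It uses only the first ten values from the `str()` output, so the numbers are not those of the full data set, and `skewness` is a hypothetical helper written for this example.

```r
# First ten residual.sugar values, copied from the str() output above.
residual_sugar <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0, 6.1)

# Moment-based skewness (hypothetical helper, not part of the original analysis).
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skew_raw <- skewness(residual_sugar)
skew_log <- skewness(log10(residual_sugar))

cat("skewness, raw scale:  ", round(skew_raw, 2), "\n")
cat("skewness, log10 scale:", round(skew_log, 2), "\n")
```

A right-skewed variable has positive skewness; the log transform pulls in the long right tail, so the second value is smaller than the first.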
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
## X fixed.acidity volatile.acidity citric.acid residual.sugar
## 837 837 6.7 0.28 0.28 2.4
## 838 838 6.7 0.28 0.28 2.4
## chlorides free.sulfur.dioxide total.sulfur.dioxide density pH
## 837 0.012 36 100 0.99064 3.26
## 838 0.012 36 100 0.99064 3.26
## sulphates alcohol quality relative.volatile.acidity
## 837 0.39 11.7 7 0.04179104
## 838 0.39 11.7 7 0.04179104
## relative.citric.acid
## 837 0.04179104
## 838 0.04179104
It seems that chloride (salt) levels are tightly concentrated (a low standard deviation), but many samples with much higher values are observed. Maybe to counteract other flavors? Perhaps sugar?
There are also samples with practically no chlorides (rows 837 and 838, with 0.012). Maybe it’s a mistake or an exceptional wine that seeks this characteristic.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
## X fixed.acidity volatile.acidity citric.acid residual.sugar
## 1080 1080 7.9 0.3 0.68 8.3
## 1082 1082 7.9 0.3 0.68 8.3
## chlorides free.sulfur.dioxide total.sulfur.dioxide density pH
## 1080 0.05 37.5 278 0.99316 3.01
## 1082 0.05 37.5 289 0.99316 3.01
## sulphates alcohol quality relative.volatile.acidity
## 1080 0.51 12.3 7 0.03797468
## 1082 0.51 12.3 7 0.03797468
## relative.citric.acid
## 1080 0.08607595
## 1082 0.08607595
Some extreme values can be observed. We might think that they belong to low quality wines, but they do not: these wines have a rating of 7/10.
We add to the data set the ratio of free to total sulfur dioxide (relative.sulfur.dioxide) to analyze it later.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02273 0.25926 0.37500 0.38231 0.48485 0.85714
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0037
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
## X fixed.acidity volatile.acidity citric.acid residual.sugar
## 87 87 8.6 0.49 0.28 1.9
## 92 92 8.6 0.49 0.28 1.9
## 93 93 8.6 0.49 0.29 2.0
## 152 152 9.2 0.52 1.00 3.4
## 364 364 12.5 0.46 0.63 2.0
## 441 441 12.6 0.31 0.72 2.2
## chlorides free.sulfur.dioxide total.sulfur.dioxide density pH
## 87 0.110 20 136 0.9972 2.93
## 92 0.110 20 136 0.9972 2.93
## 93 0.110 19 133 0.9972 2.93
## 152 0.610 32 69 0.9996 2.74
## 364 0.071 6 15 0.9988 2.99
## 441 0.072 6 29 0.9987 2.88
## sulphates alcohol quality relative.volatile.acidity
## 87 1.95 9.9 6 0.05697674
## 92 1.95 9.9 6 0.05697674
## 93 1.98 9.8 5 0.05697674
## 152 2.00 9.4 4 0.05652174
## 364 0.87 10.2 5 0.03680000
## 441 0.82 9.8 8 0.02460317
## relative.citric.acid relative.sulfur.dioxide
## 87 0.03255814 0.1470588
## 92 0.03255814 0.1470588
## 93 0.03372093 0.1428571
## 152 0.10869565 0.4637681
## 364 0.05040000 0.4000000
## 441 0.05714286 0.2068966
We could think that wines with a pH outside the 3-4 range are not very good wines. Most of those shown have ratings between 4 and 6 and, curiously, one with a very acidic pH of 2.88 has a rating of 8!
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
## X fixed.acidity volatile.acidity citric.acid residual.sugar
## 14 14 7.8 0.610 0.29 1.6
## 87 87 8.6 0.490 0.28 1.9
## 92 92 8.6 0.490 0.28 1.9
## 93 93 8.6 0.490 0.29 2.0
## 152 152 9.2 0.520 1.00 3.4
## 170 170 7.5 0.705 0.24 1.8
## chlorides free.sulfur.dioxide total.sulfur.dioxide density pH
## 14 0.114 9 29 0.9974 3.26
## 87 0.110 20 136 0.9972 2.93
## 92 0.110 20 136 0.9972 2.93
## 93 0.110 19 133 0.9972 2.93
## 152 0.610 32 69 0.9996 2.74
## 170 0.360 15 63 0.9964 3.00
## sulphates alcohol quality relative.volatile.acidity
## 14 1.56 9.1 5 0.07820513
## 87 1.95 9.9 6 0.05697674
## 92 1.95 9.9 6 0.05697674
## 93 1.98 9.8 5 0.05697674
## 152 2.00 9.4 4 0.05652174
## 170 1.59 9.5 5 0.09400000
## relative.citric.acid relative.sulfur.dioxide
## 14 0.03717949 0.3103448
## 87 0.03255814 0.1470588
## 92 0.03255814 0.1470588
## 93 0.03372093 0.1428571
## 152 0.10869565 0.4637681
## 170 0.03200000 0.2380952
We can observe that the wines with the most sulphates tend to be quite acidic (low pH). We will check later whether there is a relationship between these two variables.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
## Low Medium High
## 63 1319 217
Most of the ratings are 5 or 6. Very few wines get the extreme values of 3 or 8. I would have expected to find some 9s or 10s. Perhaps these Portuguese wines do not reach such a high quality, or perhaps the best ones were left out of this study because of their price.
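The Low / Medium / High counts above suggest that `quality.ranges` was built by binning the quality score. The breaks are not shown in the output, so the ones below are an assumption (3-4 Low, 5-6 Medium, 7 and above High), chosen to be consistent with the rows printed in this report, where wines rated 8 are labelled High and wines rated 3 are labelled Low. The vector `quality` is an illustrative sample.

```r
# Quality of the first ten wines (from the str() output) plus one 3 and one 8.
quality <- c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 3, 8)

# Assumed breaks: (2,4] -> Low, (4,6] -> Medium, (6,10] -> High.
quality.ranges <- cut(
  quality,
  breaks = c(2, 4, 6, 10),
  labels = c("Low", "Medium", "High")
)

print(table(quality.ranges))
```

`cut()` returns an ordered set of levels in the order given by `labels`, which keeps tables and plots sorted from Low to High.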
## X fixed.acidity volatile.acidity citric.acid residual.sugar
## 268 268 7.9 0.35 0.46 3.6
## 279 279 10.3 0.32 0.45 6.4
## 391 391 5.6 0.85 0.05 1.4
## 441 441 12.6 0.31 0.72 2.2
## 456 456 11.3 0.62 0.67 5.2
## 482 482 9.4 0.30 0.56 2.8
## chlorides free.sulfur.dioxide total.sulfur.dioxide density pH
## 268 0.078 15 37 0.9973 3.35
## 279 0.073 5 13 0.9976 3.23
## 391 0.045 12 88 0.9924 3.56
## 441 0.072 6 29 0.9987 2.88
## 456 0.086 6 19 0.9988 3.22
## 482 0.080 6 17 0.9964 3.15
## sulphates alcohol quality relative.volatile.acidity
## 268 0.86 12.8 8 0.04430380
## 279 0.82 12.6 8 0.03106796
## 391 0.82 12.9 8 0.15178571
## 441 0.82 9.8 8 0.02460317
## 456 0.69 13.4 8 0.05486726
## 482 0.92 11.7 8 0.03191489
## relative.citric.acid relative.sulfur.dioxide quality.ranges
## 268 0.058227848 0.4054054 High
## 279 0.043689320 0.3846154 High
## 391 0.008928571 0.1363636 High
## 441 0.057142857 0.2068966 High
## 456 0.059292035 0.3157895 High
## 482 0.059574468 0.3529412 High
It seems that most of the wines with the best rating have a high alcohol content.
## X fixed.acidity volatile.acidity citric.acid residual.sugar
## 460 460 11.6 0.580 0.66 2.20
## 518 518 10.4 0.610 0.49 2.10
## 691 691 7.4 1.185 0.00 4.25
## 833 833 10.4 0.440 0.42 1.50
## 900 900 8.3 1.020 0.02 3.40
## 1300 1300 7.6 1.580 0.00 2.10
## chlorides free.sulfur.dioxide total.sulfur.dioxide density pH
## 460 0.074 10 47 1.00080 3.25
## 518 0.200 5 16 0.99940 3.16
## 691 0.097 5 14 0.99660 3.63
## 833 0.145 34 48 0.99832 3.38
## 900 0.084 6 11 0.99892 3.48
## 1300 0.137 5 9 0.99476 3.50
## sulphates alcohol quality relative.volatile.acidity
## 460 0.57 9.0 3 0.05000000
## 518 0.63 8.4 3 0.05865385
## 691 0.54 10.7 3 0.16013514
## 833 0.86 9.9 3 0.04230769
## 900 0.49 11.0 3 0.12289157
## 1300 0.40 10.9 3 0.20789474
## relative.citric.acid relative.sulfur.dioxide quality.ranges
## 460 0.056896552 0.2127660 Low
## 518 0.047115385 0.3125000 Low
## 691 0.000000000 0.3571429 Low
## 833 0.040384615 0.7083333 Low
## 900 0.002409639 0.5454545 Low
## 1300 0.000000000 0.5555556 Low
At first glance, the lowest-rated wines share no common attribute that could explain their low score.
There are 1599 wines in the dataset with 12 features:
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
12 - quality (score between 0 and 10)
The first 11 attributes are numerical and the last one (quality) is an ordinal score stored as an integer, which I treat as categorical.
Observations:
* Most quality ratings are 5 or 6.
* Wines with a quality of 8 have high values of alcohol.
* The median alcohol content is around 10%.
* Almost all wines have a pH between 3 and 4.
* The density of the wine is slightly lower than that of water.
The main features in the data set are pH, alcohol and quality, although surely all the attributes contribute to the flavor of the wine. Wine is a very complex product whose quality depends on many factors.
the wine. I think extreme values of any flavor component break the balance of flavors on the palate.
I created the following additional variables:
1. Relative volatile acidity with respect to fixed acidity.
2. Relative citric acid with respect to fixed acidity.
3. Relative free sulfur dioxide with respect to total sulfur dioxide.
4. Quality ranges, grouping quality into Low, Medium and High values.
Most distributions are normal or right-skewed.
The citric acid distribution appears bimodal, with two peaks around 0 and 0.49.
Apart from adding these variables, it has not been necessary to perform any other operations on the data.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We can find the following correlations:
1. Quality and alcohol (0.476)
2. Quality and volatile acidity (-0.391)
3. Sulphates and citric acid (0.313)
4. pH and fixed acidity (-0.683)
5. pH and citric acid (-0.542)
Other relationships that I want to check are:
1. Quality and relative sulfur dioxide
2. Quality and citric acid
3. Quality and fixed acidity
4. Quality and pH
5. Quality and residual sugar
6. Quality and chlorides
##
## Pearson's product-moment correlation
##
## data: fixed.acidity and volatile.acidity
## t = -10.589, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3013681 -0.2097433
## sample estimates:
## cor
## -0.2561309
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
We can observe a slight tendency for volatile acidity to decrease as fixed acidity increases.
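For readers who want to see how the test statistic in the output relates to the correlation itself, the sketch below recomputes t from the reported r and degrees of freedom using the standard formula t = r * sqrt(df) / sqrt(1 - r^2). Only the two input numbers come from the report; the variable names are chosen for this example.

```r
r  <- -0.2561309  # correlation between fixed and volatile acidity, from the output
df <- 1597        # degrees of freedom (n - 2, with n = 1599)

# t statistic of the Pearson correlation test.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

cat("t =", round(t_stat, 3), "\n")
cat("p-value below 2.2e-16:", p_value < 2.2e-16, "\n")
```

With 1,599 observations even a modest correlation such as -0.26 yields a very large t, which is why the p-values in this section are so small; the size of r, not the p-value, is what indicates how strong each relationship is.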
##
## Pearson's product-moment correlation
##
## data: fixed.acidity and citric.acid
## t = 36.234, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6438839 0.6977493
## sample estimates:
## cor
## 0.6717034
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
We can observe a tendency for fixed acidity to increase as citric acid increases.
To summarize the acidity results: citric acid rises together with fixed acidity, and higher fixed acidity goes with lower volatile acidity.
##
## Pearson's product-moment correlation
##
## data: total.sulfur.dioxide and free.sulfur.dioxide
## t = 35.84, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6395786 0.6939740
## sample estimates:
## cor
## 0.6676665
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
We can observe that free sulfur dioxide tends to increase with total sulfur dioxide. The increase levels off as the total grows.
##
## Pearson's product-moment correlation
##
## data: sulphates and citric.acid
## t = 13.159, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2678558 0.3563278
## sample estimates:
## cor
## 0.31277
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 44 rows containing non-finite values (stat_smooth).
## Warning: Removed 44 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
There is a slight positive relationship between sulphates and citric acid.
##
## Pearson's product-moment correlation
##
## data: pH and fixed.acidity
## t = -37.366, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7082857 -0.6559174
## sample estimates:
## cor
## -0.6829782
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 45 rows containing non-finite values (stat_smooth).
## Warning: Removed 45 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
As expected, a higher fixed acidity goes with a lower (more acidic) pH.
##
## Pearson's product-moment correlation
##
## data: pH and citric.acid
## t = -25.767, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5756337 -0.5063336
## sample estimates:
## cor
## -0.5419041
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 43 rows containing non-finite values (stat_smooth).
## Warning: Removed 43 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
As in the previous case, and as expected, more citric acid goes with a lower (more acidic) pH.
##
## Pearson's product-moment correlation
##
## data: quality and pH
## t = -2.3109, df = 1597, p-value = 0.02096
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.106451268 -0.008734972
## sample estimates:
## cor
## -0.05773139
## Warning: Removed 30 rows containing non-finite values (stat_summary).
## Warning: Removed 32 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
Quality and pH are practically unrelated (-0.06), although the boxplot shows that the average pH of the higher-quality wines is slightly more acidic than that of the rest.
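The 95 percent confidence interval printed by `cor.test` is based on Fisher's z transform. As a sketch of that calculation, the code below rebuilds the interval for quality and pH from the reported correlation and the sample size; it should land on roughly the same bounds as the output above (about -0.106 and -0.009).

```r
r <- -0.05773139  # correlation between quality and pH, from the output
n <- 1599         # number of wines

z  <- atanh(r)         # Fisher z transform of r
se <- 1 / sqrt(n - 3)  # approximate standard error on the z scale

# Build the interval on the z scale, then transform back to the r scale.
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
print(ci)
```

The interval excludes zero only narrowly, which fits the reading that the relationship is statistically detectable but practically negligible.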
##
## Pearson's product-moment correlation
##
## data: quality and volatile.acidity
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4313210 -0.3482032
## sample estimates:
## cor
## -0.3905578
## Warning: Removed 30 rows containing non-finite values (stat_summary).
## Warning: Removed 33 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
As the correlation value (-0.39) indicates, lower volatile acidity goes with higher wine quality.
##
## Pearson's product-moment correlation
##
## data: quality and alcohol
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
## Warning: Removed 21 rows containing non-finite values (stat_summary).
## Warning: Removed 31 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
##
## Pearson's product-moment correlation
##
## data: quality and fixed.acidity
## t = 4.996, df = 1597, p-value = 6.496e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07548957 0.17202667
## sample estimates:
## cor
## 0.1240516
## Warning: Removed 27 rows containing non-finite values (stat_summary).
## Warning: Removed 31 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
Fixed acidity is weakly and positively related to the quality of the wine (0.12), unlike volatile acidity, which is inversely related to it.
Maybe it’s because acidity is pleasant in the mouth but not in the nose.
##
## Pearson's product-moment correlation
##
## data: quality and residual.sugar
## t = 0.5488, df = 1597, p-value = 0.5832
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03531327 0.06271056
## sample estimates:
## cor
## 0.01373164
## Warning: Removed 31 rows containing non-finite values (stat_summary).
## Warning: Removed 46 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
From what we observe in the graphs and in the correlation (0.01), there is no relationship between quality and residual sugar.
##
## Pearson's product-moment correlation
##
## data: quality and chlorides
## t = -5.1948, df = 1597, p-value = 2.313e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.17681041 -0.08039344
## sample estimates:
## cor
## -0.1289066
## Warning: Removed 32 rows containing non-finite values (stat_summary).
## Warning: Removed 33 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
From what we observe in the graphs and in the correlation (-0.13), there is a slight tendency for wine quality to improve as chlorides decrease.
##
## Pearson's product-moment correlation
##
## data: alcohol and density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5322547 -0.4583061
## sample estimates:
## cor
## -0.4961798
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 41 rows containing non-finite values (stat_smooth).
## Warning: Removed 59 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
There is a clear relationship between alcohol and density, which was expected since alcohol is less dense than water and therefore more alcohol reduces the density of the wine.
more acidic (low pH and high fixed acidity) and with low volatile acidity. It is likely that everything that brings flavor to the wine improves its rating, as long as its aromas are not exaggerated. There is a moderately strong relationship between quality and alcohol (0.48). This is probably because more mature wines tend to have higher quality, and the longer the maturation, the more time there is to ferment the sugar into alcohol.
It would have been very interesting if we had in the data set information about the maturation time of the wine.
relationship between volatile and fixed acidity, citric acid and fixed acidity, free and total SO2, and citric acid and sulphates. The relationship was positive in all cases except volatile and fixed acidity, where volatile acidity decreases as fixed acidity increases.
The strongest relationship found is between pH and fixed acidity (-0.68), followed by fixed acidity and citric acid (0.67) and free and total sulfur dioxide (0.67). Among the features of interest, the strongest are alcohol and density (-0.50) and alcohol and quality (0.48).
As already anticipated, fixed acidity is inversely related to volatile acidity but directly to citric acid.
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
There is no relationship between sulphates and sulfur dioxide, although there is a relationship between free and total sulfur dioxide, as previously noted.
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
The relationship between acidity and pH is clear at every quality level, as is the higher alcohol content of high quality wines.
The tendency for wine quality to improve as volatile acidity and chlorides decrease is also clear.
##
## Calls:
## m1: lm(formula = quality ~ alcohol, data = wqr)
## m2: lm(formula = quality ~ alcohol + pH, data = wqr)
## m3: lm(formula = quality ~ alcohol + pH + fixed.acidity, data = wqr)
## m4: lm(formula = quality ~ alcohol + pH + fixed.acidity + volatile.acidity,
## data = wqr)
## m5: lm(formula = quality ~ alcohol + pH + fixed.acidity + volatile.acidity +
## chlorides, data = wqr)
##
## ==========================================================================================
## m1 m2 m3 m4 m5
## ------------------------------------------------------------------------------------------
## (Intercept) 1.875*** 4.426*** 3.132*** 3.588*** 3.925***
## (0.175) (0.387) (0.598) (0.571) (0.603)
## alcohol 0.361*** 0.386*** 0.381*** 0.328*** 0.324***
## (0.017) (0.017) (0.017) (0.017) (0.017)
## pH -0.850*** -0.541*** -0.264 -0.333*
## (0.116) (0.159) (0.153) (0.158)
## fixed.acidity 0.039** 0.021 0.018
## (0.014) (0.013) (0.013)
## volatile.acidity -1.262*** -1.248***
## (0.100) (0.100)
## chlorides -0.652
## (0.375)
## ------------------------------------------------------------------------------------------
## R-squared 0.227 0.252 0.256 0.324 0.325
## adj. R-squared 0.226 0.251 0.254 0.322 0.323
## sigma 0.710 0.699 0.697 0.665 0.665
## F 468.267 268.888 182.731 190.776 153.418
## p 0.000 0.000 0.000 0.000 0.000
## Log-likelihood -1721.057 -1694.466 -1690.443 -1613.880 -1612.366
## Deviance 805.870 779.508 775.596 704.768 703.435
## AIC 3448.114 3396.931 3390.886 3239.760 3238.733
## BIC 3464.245 3418.440 3417.772 3272.022 3276.373
## N 1599 1599 1599 1599 1599
## ==========================================================================================
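To make the table easier to read, the sketch below plugs the printed m1 coefficients into the model equation and recomputes an adjusted R-squared from an R-squared value. The coefficients and fit statistics are copied from the table (and are therefore rounded); `predict_m1` and `adj_r_squared` are hypothetical helper names introduced for this example.

```r
# Coefficients of m1 (quality ~ alcohol), copied from the table above.
m1_intercept <- 1.875
m1_alcohol   <- 0.361

# Predicted quality for a given alcohol content (% by volume).
predict_m1 <- function(alcohol) m1_intercept + m1_alcohol * alcohol

# At the mean alcohol content (10.42) the prediction is close to the mean quality (5.636).
print(predict_m1(10.42))

# Adjusted R-squared for a model with p predictors fitted on n observations.
adj_r_squared <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# m5 has five predictors; the result is close to the 0.323 printed in the table.
print(round(adj_r_squared(0.325, n = 1599, p = 5), 3))
```

Each additional percentage point of alcohol raises the m1 prediction by about 0.36 quality points, and the small gap between R-squared and adjusted R-squared reflects the large sample relative to the number of predictors.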
and a tendency toward lower volatile acidity and higher fixed acidity as the quality of the wine increases.
There is no relationship between sulphates and sulfur dioxide, although there is a relationship between free and total sulfur dioxide, as previously noted.
The tendency for wine quality to improve as volatile acidity and chlorides decrease is also clear.
In general I am surprised by how weak the relationships between the attributes of the wine and its rating are. Some attributes, such as residual sugar or SO2, do not seem to influence it at all.
relationships between the subjective quality value and the attributes of the wines are not strong.
Fixed acidity is inversely related to volatile acidity but directly related to citric acid. There is clearly a low presence of citric acid in the low quality samples, and a tendency toward lower volatile acidity and higher fixed acidity as the quality of the wine increases.
## Warning: Removed 131 rows containing non-finite values (stat_summary).
## Warning: Removed 177 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
There is a moderately strong relationship between quality and alcohol (0.48). This is probably because more mature wines tend to have higher quality, and the longer the maturation, the more time there is to ferment the sugar into alcohol.
It would have been very interesting if we had in the data set information about the maturation time of the wine.
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 231 rows containing non-finite values (stat_smooth).
## Warning: Removed 268 rows containing missing values (geom_point).
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
One of the clearest relationships in this dataset is between alcohol and density (-0.5), which was expected since alcohol is less dense than water and therefore more alcohol reduces the density of the wine.
In conclusion, the factor that seems to contribute most to the quality of the wine (the subjective evaluation of the jury) is its maturation time. Time is not among the attributes of the dataset, but I infer it from the alcohol content, which results from a maturation process that requires a lot of time.
Other factors that positively influence the quality of the wine are all the nuances in taste and smell that come from the balance of acids found in it.
An acidic (vinegary) aroma lowers the rating of the wine, as observed in the study.
The biggest difficulty I have found in this dataset is to find clear relationships between the attributes. A quality wine is full of flavors, nuances, textures and smells. And each of these characteristics will be reflected differently in the chemical attributes of the wine.
It is clear that it is complicated to be an enologist!
In part, the weak relationships between the attributes and the rating explain why the linear prediction model is not accurate: the best model explains only about a third of the variance in quality (R-squared 0.325).
It would have been very interesting to have data such as the maturity of the wine, the type of process it has undergone, the type of grape, the type of barrel used, etc. The winemaking process is very complex and any factor can change the final result.